Code Translation

Translation is the process of turning source code into object code

Several programs are involved in code translation, such as compilers, interpreters and linkers

Older languages (eg. C, C++) use a compiler and linker in the traditional manner
Newer languages (eg. Java, Python) use both a compiler and interpreter together

Types of Code

![[Pasted image 20250519090611.png]]

Source code is portable (platform independent)

Compilers vs Interpreters

Compilers

Comparing Translation Methods

Each method has advantages and disadvantages

| Compilers | Interpreters |
|---|---|
| Translate entire file (with context) | Translate line by line (no look-ahead) |
| Compilation process is fast | Interpretation is slow |
| Compiled code runs much faster | Interpreted code runs much slower |
| Run once (to make executable code) | Run every time the source file is 'executed' |
| Check for syntax and semantic errors | Can only check for syntax errors |
| Programming tools can be complex | Programming environment is easier |
| Syntax errors hard to locate by line | Syntax errors easier to identify in code |
| Need careful organisation of source code | Useful for rapid prototyping |
| Debugging can be complicated | Debugging tools are easier to use |
| Generated object code is platform locked | Source code is portable across platforms |
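As a rough illustration of the first row of the table, the Python sketch below uses Python's built-in `compile()` and `exec()` to mimic the two strategies on the same four-line program. It is an analogy only: `run_whole_file` and `run_line_by_line` are hypothetical helpers, and real compilers and interpreters do far more than this.

```python
# Each string is one line of a tiny program; line 3 contains a syntax error.
PROGRAM = [
    "total = 1",
    "total = total + 2",
    "total = total +",
    "total = total * 10",
]


def run_whole_file(lines):
    """Compiler-style: translate the whole file first, so nothing runs if any line is invalid."""
    env = {}
    try:
        code = compile("\n".join(lines), "<whole-file>", "exec")
    except SyntaxError as err:
        return env, f"syntax error on line {err.lineno}; nothing executed"
    exec(code, env)
    return env, "ok"


def run_line_by_line(lines):
    """Interpreter-style: translate and run one line at a time, with no look-ahead."""
    env = {}
    for number, line in enumerate(lines, start=1):
        try:
            exec(compile(line, f"<line {number}>", "exec"), env)
        except SyntaxError:
            return env, f"syntax error on line {number}; earlier lines already ran"
    return env, "ok"


if __name__ == "__main__":
    for runner in (run_whole_file, run_line_by_line):
        env, status = runner(PROGRAM)
        print(runner.__name__, "->", status, "| total =", env.get("total"))
```

The whole-file run rejects the program before any line executes, so `total` is never set; the line-by-line run has already set `total` to 3 by the time it reaches the faulty line.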

New Languages

New programming languages are regularly created to solve niche problems

Javac & Java

Java uses both a compiler (javac) and interpreter (java)

Programmer's System:

Some other languages use a similar approach but it all happens on the user's system
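CPython, the standard Python implementation, is one example of this: it compiles source code to bytecode and then interprets that bytecode, both on the user's machine. The sketch below makes the two steps visible using the built-in `compile()` and `exec()` functions and the standard `dis` module; the exact bytecode listing printed by `dis` varies between Python versions.

```python
import dis

SOURCE = "x = 5\nresult = x * 4"

# Step 1: compile the source text into a code object containing bytecode.
code_object = compile(SOURCE, "<example>", "exec")

# Show the bytecode instructions the interpreter will execute (output varies by version).
dis.dis(code_object)

# Step 2: the interpreter executes the bytecode in a fresh namespace.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])
```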

Types of Compiler

Early compilers were single pass

Linking

There can be many source files, each with a corresponding object file
Source files also reference library code, which exists in multiple object files
The linker pulls all these together into the final executable program image

![[Pasted image 20250519090647.png]]
Note: Difference between static and dynamic linking
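The linker's core job of combining object files and resolving the references between them can be sketched as a toy static linker in Python. The `ObjectFile` class and `link` function are hypothetical simplifications: real object files contain machine code, relocation entries and much more, and dynamic linking defers part of this resolution to load or run time, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical, simplified object file.

    defines:    symbols this file provides, mapped to their size in bytes
    references: symbols this file uses but expects another file to provide
    """
    name: str
    defines: dict[str, int]
    references: set[str] = field(default_factory=set)


def link(objects: list[ObjectFile], base_address: int = 0x1000) -> dict[str, int]:
    """Toy static linker: lay out every defined symbol and resolve all references.

    Returns a symbol table mapping each symbol name to its address in the final image.
    """
    symbol_table: dict[str, int] = {}
    address = base_address
    # Pass 1: place each object file's symbols one after another in memory.
    for obj in objects:
        for symbol, size in obj.defines.items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol {symbol!r} in {obj.name}")
            symbol_table[symbol] = address
            address += size
    # Pass 2: every reference must now resolve to some defined symbol.
    for obj in objects:
        missing = obj.references - set(symbol_table)
        if missing:
            raise ValueError(f"undefined reference(s) in {obj.name}: {sorted(missing)}")
    return symbol_table


if __name__ == "__main__":
    main_o = ObjectFile("main.o", {"main": 32}, {"square", "print_result"})
    maths_o = ObjectFile("maths.o", {"square": 16})
    lib_o = ObjectFile("lib.o", {"print_result": 64})  # stands in for library code
    table = link([main_o, maths_o, lib_o])
    print({name: hex(addr) for name, addr in table.items()})
```

With these sizes the symbols land at `0x1000`, `0x1020` and `0x1030`. Removing `lib.o` from the list produces an "undefined reference" error, the same kind of failure a real linker reports when library code is missing.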

Components of Programming Languages

Language Grammars

A precise description of the programming language is needed so the translator can:

Syntax Diagrams

Grammar rules can be drawn as a set of diagrams
![[Pasted image 20250519091205.png]]
Expressions are formed from one or more tokens
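Before grammar rules are applied, the translator normally splits the source into tokens. A minimal tokenizer sketch in Python is below; the three token kinds (`number`, `name`, `op`) and the regular expression are illustrative assumptions, not the token set of any particular language.

```python
import re

# Hypothetical token kinds for a tiny expression language.
TOKEN_PATTERN = re.compile(
    r"(?P<number>\d+(?:\.\d+)?)"   # integers or decimals, eg. 42 or 2.5
    r"|(?P<name>[A-Za-z_]\w*)"     # identifiers, eg. width
    r"|(?P<op>[-+*/()=])"          # single-character operators and brackets
)


def tokenize(text):
    """Split text into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("area = width * 2.5"))
```

Here the expression `area = width * 2.5` is formed from five tokens: two names, two operators and a number.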

Extended Backus-Naur Form

The translator uses a more precise grammar called extended Backus-Naur form (EBNF)
Each production rule is defined in terms of non-terminal rules and terminal symbols
![[Pasted image 20250519091347.png]]
Terminal symbols are defined in quotes
The comma character concatenates things to build the rule from smaller parts
EBNF uses some symbols (eg. curly brackets) as notation, so if those symbols are to be matched literally they must be in quotes (as terminal symbols)

Every aspect of the language must be defined in this way, so there will be hundreds of rules for any reasonably complex programming language (Java has around 120)

EBNF Notation (ISO Version)

There are several variants of EBNF; the ISO version (ISO/IEC 14977) is used here

| Syntax | Meaning |
|---|---|
| = | Defines a rule |
| \| | OR (alternation) |
| , | Concatenation |
| rule-name | Non-terminal rule reference |
| "symbol" | Terminal symbol |
| [item] | Optional item (zero or one occurrence) |
| {item} | Repeated item (zero or more occurrences) |
| (items) | Grouped items |
| {item}- | At least one repeated item (one or more occurrences) |
| n*item | Repeated item (exactly n times) |
| ; | Rule terminator |
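Each notation in the table corresponds to a simple matching operation. The Python sketch below expresses them as small parser combinators: every matcher takes the text and a start position and returns the set of positions where a match could end (an empty set means no match). A rule definition (`=` ... `;`) becomes a Python assignment. The function names `term`, `seq`, `alt`, `opt`, `rep`, `rep1` and `times` are hypothetical and belong to no library.

```python
def term(symbol):
    """"symbol": match these exact characters."""
    def match(text, pos):
        return {pos + len(symbol)} if text.startswith(symbol, pos) else set()
    return match


def seq(*parts):
    """a, b, c: concatenation; each part starts where the previous one ended."""
    def match(text, pos):
        ends = {pos}
        for part in parts:
            ends = {end for start in ends for end in part(text, start)}
        return ends
    return match


def alt(*options):
    """a | b: any one of the options."""
    def match(text, pos):
        return set().union(*(option(text, pos) for option in options))
    return match


def opt(item):
    """[item]: zero or one occurrence."""
    def match(text, pos):
        return {pos} | item(text, pos)
    return match


def rep(item):
    """{item}: zero or more occurrences (stops once no new end positions appear)."""
    def match(text, pos):
        ends, frontier = {pos}, {pos}
        while frontier:
            frontier = {e for start in frontier for e in item(text, start)} - ends
            ends |= frontier
        return ends
    return match


def rep1(item):
    """{item}-: one or more occurrences."""
    return seq(item, rep(item))


def times(n, item):
    """n*item: exactly n occurrences."""
    return seq(*([item] * n))


def full_match(rule, text):
    """True if the rule can consume the entire text."""
    return len(text) in rule(text, 0)


if __name__ == "__main__":
    # rule = [ "_" ], { "a" | "b" }-, 2*"!";
    rule = seq(opt(term("_")), rep1(alt(term("a"), term("b"))), times(2, term("!")))
    print(full_match(rule, "_abba!!"))  # True
    print(full_match(rule, "abba!"))    # False: exactly two "!" are required
```

Returning every possible end position (rather than just one) means alternation and repetition never commit to a wrong choice too early, at the cost of some efficiency.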

EBNF Examples

Grammar rules for numbers can be defined

digit        = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9";
unsigned     = {digit}-;
integer      = unsigned | "+", unsigned | "-", unsigned;
fraction     = ".", unsigned;
decimal      = unsigned | fraction | unsigned, fraction;
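One way a translator can use these rules is a hand-written recursive-descent recognizer with one function per rule. In the Python sketch below, each hypothetical `match_*` function mirrors the EBNF rule of the same name and returns the position just after what it matched, or `None` if it does not match. EBNF alternatives are unordered, so `match_decimal` has to choose an order: it tries the longer `unsigned, fraction` before plain `unsigned`, which is an assumption of this sketch rather than something the grammar states.

```python
DIGITS = "0123456789"


def match_digit(text, pos):
    """digit = "0" | "1" | ... | "9";"""
    return pos + 1 if pos < len(text) and text[pos] in DIGITS else None


def match_unsigned(text, pos):
    """unsigned = {digit}-;  (one or more digits)"""
    end = match_digit(text, pos)
    if end is None:
        return None
    while (next_end := match_digit(text, end)) is not None:
        end = next_end
    return end


def match_integer(text, pos):
    """integer = unsigned | "+", unsigned | "-", unsigned;"""
    if text.startswith(("+", "-"), pos):
        return match_unsigned(text, pos + 1)
    return match_unsigned(text, pos)


def match_fraction(text, pos):
    """fraction = ".", unsigned;"""
    if text.startswith(".", pos):
        return match_unsigned(text, pos + 1)
    return None


def match_decimal(text, pos):
    """decimal = unsigned | fraction | unsigned, fraction;"""
    end = match_unsigned(text, pos)
    if end is not None:
        # Try the longer alternative "unsigned, fraction" before settling for "unsigned".
        with_fraction = match_fraction(text, end)
        return with_fraction if with_fraction is not None else end
    return match_fraction(text, pos)


def is_valid(rule, text):
    """A string is valid only if the rule matches all of it."""
    return rule(text, 0) == len(text)


for candidate in ["42", "-7", "3.14", ".5", "12."]:
    print(candidate, is_valid(match_integer, candidate), is_valid(match_decimal, candidate))
```

Note that by these rules `12.` is neither an integer nor a decimal, because `fraction` requires at least one digit after the point.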

Valid variable names can be defined

lower    = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" |
           "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z";
upper    = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" |
           "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z";
letter   = lower | upper;
varname  = letter, { letter | digit };

That is, a variable name must start with a letter, followed by any number of letters and digits in any combination
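Because `varname` involves no recursion, it describes a regular language and can also be checked with a regular expression. The Python sketch below is a hand translation of the rule, not something generated from the grammar; the character classes are ASCII-only so that they agree exactly with the `lower`, `upper` and `digit` rules.

```python
import re

# letter = lower | upper;  digit = "0" | ... | "9";
# varname = letter, { letter | digit };
VARNAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_varname(text):
    """True if the whole string matches the varname rule."""
    return VARNAME.fullmatch(text) is not None


for candidate in ["total", "x2", "Area51", "2fast", "my_var", ""]:
    print(f"{candidate!r:10} {is_varname(candidate)}")
```

`2fast` is rejected because it starts with a digit, and `my_var` because this rule has no underscore.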

EBNF Matching

Given a string of characters, you can work out whether it is a valid match for a rule

The varname rule can be extended to allow underscores after the first letter:

varname = letter, { "_" | letter | digit };

Or also allow a single optional underscore at the start of a variable name:

varname = [ "_" ], letter, { "_" | letter | digit };
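Working out a match by hand amounts to stepping through the rule from left to right. The Python sketch below does this for the two underscore variants of `varname`; the `leading_underscore` flag is a hypothetical parameter used to switch between them, and letters and digits are restricted to ASCII as in the earlier rules.

```python
import string

LETTERS = set(string.ascii_letters)   # letter = lower | upper;
DIGITS = set(string.digits)           # digit = "0" | ... | "9";


def matches_varname(text, leading_underscore=False):
    """Check text against one of:
         varname = letter, { "_" | letter | digit };             (default)
         varname = [ "_" ], letter, { "_" | letter | digit };    (leading_underscore=True)
    """
    pos = 0
    # [ "_" ]: optionally consume one underscore.
    if leading_underscore and text.startswith("_"):
        pos += 1
    # letter: exactly one letter is required at this point.
    if pos >= len(text) or text[pos] not in LETTERS:
        return False
    pos += 1
    # { "_" | letter | digit }: zero or more of these, up to the end of the string.
    return all(ch == "_" or ch in LETTERS or ch in DIGITS for ch in text[pos:])


for candidate in ["my_var", "_count", "__init", "_", "x_1_"]:
    print(f"{candidate!r:10} {matches_varname(candidate)} "
          f"{matches_varname(candidate, leading_underscore=True)}")
```

Consuming the optional underscore whenever it is present is safe here, because the following `letter` can never be an underscore. Under the second rule `_count` becomes valid, but `__init` is still rejected because only one leading underscore is allowed.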

A compiler will try to match the source code against the EBNF grammar of the language